-
Free, publicly-accessible full text available June 1, 2026
-
Trajectory inference methods are essential for analyzing the developmental paths of cells in single-cell sequencing datasets. They provide insights into cellular differentiation, transitions, and lineage hierarchies, helping unravel the dynamic processes underlying development and disease progression. However, many existing tools lack a coherent statistical model and reliable uncertainty quantification, limiting their utility and robustness. In this paper, we introduce VITAE (Variational Inference for Trajectory by AutoEncoder), a statistical approach that integrates a latent hierarchical mixture model with variational autoencoders to infer trajectories. The statistical hierarchical model enhances the interpretability of our framework, while the posterior approximations generated by our variational autoencoder ensure computational efficiency and provide uncertainty quantification of cell projections along trajectories. Specifically, VITAE enables simultaneous trajectory inference and data integration, improving the accuracy of learning a joint trajectory structure in the presence of biological and technical heterogeneity across datasets. We show that VITAE outperforms other state-of-the-art trajectory inference methods on both real and synthetic data under various trajectory topologies. Furthermore, we apply VITAE to jointly analyze three distinct single-cell RNA sequencing datasets of the mouse neocortex, unveiling comprehensive developmental lineages of projection neurons. VITAE effectively reduces batch effects within and across datasets and uncovers finer structures that might be overlooked in individual datasets. Additionally, we showcase VITAE's efficacy in integrative analyses of multiomic datasets with continuous cell population structures.
-
Generalized cross-validation (GCV) is a widely used method for estimating the squared out-of-sample prediction risk that applies a scalar degrees-of-freedom adjustment (in a multiplicative sense) to the squared training error. In this paper, we examine the consistency of GCV for estimating the prediction risk of arbitrary ensembles of penalized least-squares estimators. We show that GCV is inconsistent for any finite ensemble of size greater than one. Towards repairing this shortcoming, we identify a correction that involves an additional scalar correction (in an additive sense) based on degrees-of-freedom-adjusted training errors from each ensemble component. The proposed estimator (termed CGCV) maintains the computational advantages of GCV and requires no sample splitting, model refitting, or out-of-bag risk estimation. The estimator stems from a finer inspection of the ensemble risk decomposition and two intermediate risk estimators for the components in this decomposition. We provide a non-asymptotic analysis of CGCV and the two intermediate risk estimators for ensembles of convex penalized estimators under Gaussian features and a linear response model. Furthermore, in the special case of ridge regression, we extend the analysis to general feature and response distributions using random matrix theory, which establishes model-free uniform consistency of CGCV.
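To make the multiplicative degrees-of-freedom adjustment concrete, here is a minimal sketch of textbook GCV for a single ridge estimator, not the CGCV correction proposed in the paper. It assumes one feature and no intercept so that it needs no linear algebra library; the function name `ridge_gcv` and the toy data are hypothetical.

```python
def ridge_gcv(x, y, lam):
    """Textbook GCV risk estimate for one-feature ridge regression (no intercept).

    GCV = (mean squared training error) / (1 - df / n)^2,
    where df is the trace of the ridge smoother matrix.
    Illustrative only: a single estimator, not an ensemble.
    """
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    # Ridge coefficient for the penalized problem sum (y - b x)^2 + lam * b^2.
    beta = sxy / (sxx + lam)

    # Squared training error, averaged over the n observations.
    train_err = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y)) / n

    # Degrees of freedom: trace of S = x x^T / (sxx + lam).
    df = sxx / (sxx + lam)

    # Multiplicative adjustment of the training error.
    return train_err / (1.0 - df / n) ** 2


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [1.0, 2.0, 3.0, 5.0]
    for lam in (0.0, 1.0, 10.0):
        print(lam, ridge_gcv(x, y, lam))
```

The adjustment only inflates the training error, since the degrees of freedom lie between 0 and 1 in this one-feature setting. According to the abstract, this form alone is inconsistent for ensembles of more than one estimator, which is what motivates the additive correction.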
